Abstract. This paper focuses on the size-biased permutation of $n$ independent and
identically distributed (i.i.d) positive random variables. Our setting is a finite
dimensional analogue of the size-biased permutation of ranked jumps of a subordinator
studied in Perman-Pitman-Yor (PPY) [27], as well as a special form of induced order
statistics [3,8]. This intersection grants us different tools for deriving distributional
properties. Their comparisons lead to new results, as well as simpler proofs of existing
ones. Our main contribution, Theorem 19 in Section 5, describes the asymptotic
distribution of the last few terms in a finite i.i.d size-biased permutation via a Poisson
coupling with its few smallest order statistics.



1. Introduction 

Let $(x_1, x_2, \ldots)$ be a positive sequence with finite sum $t = \sum_{i \ge 1} x_i$. Its size-biased
permutation (s.b.p) is the same sequence presented in a random order $(x_{\sigma_1}, x_{\sigma_2}, \ldots)$, where
$P(\sigma_1 = i) = x_i/t$, and for $k$ distinct indices $i_1, \ldots, i_k$,

\[ P(\sigma_k = i_k \mid \sigma_1 = i_1, \ldots, \sigma_{k-1} = i_{k-1}) = \frac{x_{i_k}}{t - (x_{i_1} + \cdots + x_{i_{k-1}})}. \tag{1} \]

An index $i$ with bigger 'size' $x_i$ tends to appear earlier in the permutation, hence the name
size-biased. Size-biased permutation of a random sequence is defined by conditioning on 
the sequence values. 
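The definition in (1) amounts to successive sampling without replacement, with selection probability proportional to size. The following is a minimal Python sketch under that reading, assuming a finite list of positive sizes; the function name and the `rng` argument are illustrative and do not come from the paper.

```python
import random

def size_biased_permutation(sizes, rng=random):
    """Return a size-biased permutation of a finite list of positive sizes, following (1)."""
    remaining = list(sizes)
    result = []
    while remaining:
        # Pick position i with probability remaining[i] / sum(remaining).
        i = rng.choices(range(len(remaining)), weights=remaining, k=1)[0]
        result.append(remaining.pop(i))
    return result
```

For sizes (1, 3, 4), for instance, the first term of the output should equal 4 about half of the time, since 4/(1 + 3 + 4) = 1/2.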

A brief history. One of the earliest occurrences of size-biased permutation is in social
choice theory. Luce [22] studied distributions on the permutations of $n$ letters given by (1)
as a function of the $x_i$'s. These are the relative scores or desirabilities of the candidates,
to be inferred through observing multiple rankings. This ranking model is now known as
the Plackett-Luce model and it has wide applications [7,30,34].



Around the same time, biologists in population genetics were interested in inferring the 
distribution of alleles in a population through sampling. In these applications, $x_i$ is the
abundance and $x_i/t$ is the relative abundance of the $i$-th species [12]. Size-biased permutation
models the outcome of successive sampling, where one samples without replacement
from the population and records the abundance of newly discovered species in the order 
that they appear. To account for the occurrence of new types of alleles through muta- 
tion and migration, they considered random abundance sequences and did not assume 
an upper limit to the number of possible types. Species sampling from random infinite 
sequence is sometimes known as size-biased random permutation, a term coined by Patil
and Taillie [26]. The earliest work along this vein is perhaps that of McCloskey [24], who
obtained results on the size-biased permutation of ranked jumps in a certain Poisson point
process (p.p.p). The distribution of this ranked sequence is now known as the Poisson-Dirichlet
$PD(0, \theta)$, and the distribution of its size-biased permutation is the $GEM(0, \theta)$;
see Section 2.2 for their definitions. This work was later generalized by Perman, Pitman
and Yor [27], who studied size-biased permutation of ranked jumps of a subordinator; see
Section 2.

The finite combinatorial version of size-biased sampling from $PD(0, \theta)$ is the Ewens
sampling formula [13]. Kingman [20] wrote: 'One of the most striking results of recent
theoretical research in population genetics is the sampling formula enunciated by Ewens
and shown by (others) to hold for a number of different population models'. In studying
this formula, Kingman initiated the theory of partition structure [20,21]. Kingman
showed that the Ewens sampling formula defines a particular partition structure by deletion
of type; see [14] for recent developments. Subsequent authors have studied partition
structures and their representations in terms of exchangeable random partitions, random
discrete distributions, random trees and associated random processes of fragmentation and
coalescence, Bayesian statistics and machine learning. See [29] and references therein.



Organization. This paper focuses on finite i.i.d size-biased permutation, that is, the size-biased
permutation of $n$ independent and identically distributed (i.i.d) random variables
from some distribution $F$ on $(0, \infty)$. Our setting is a finite dimensional analogue of the
size-biased permutation of ranked jumps of a subordinator studied in Perman-Pitman-Yor
(PPY) [27], as well as a special form of induced order statistics [3,8]; see Section
4.3 for a brief historical account of this field. We utilize this connection in our paper
to arrive at new results. In Section 2 we study the joint and marginal distributions of finite
i.i.d size-biased permutation through a Markov chain. We draw connections between our
setting and that of PPY [27], and prove a converse of the stick-breaking representation
of Patil and Taillie when $F$ is gamma (Proposition 4). In Section 3 we show that finite
i.i.d size-biased permutation is a form of induced order statistics, and use this fact
to derive previous distributional results. Comparisons to corresponding statements in
Section 2 lead to a new beta-gamma identity when $F$ is gamma (Corollary 13). As the
sequence length tends to infinity, we derive asymptotics of the last $u$ fraction of finite i.i.d
size-biased permutation in Section 4 and that of the first few terms in Section 5.

Notations. We shall write gamma$(a, \lambda)$ for a Gamma distribution whose density at $x$
is $\lambda^a x^{a-1} e^{-\lambda x}/\Gamma(a)$ for $x > 0$, and beta$(a, b)$ for the Beta distribution whose density at $x$
is $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1}(1-x)^{b-1}$ for $x \in (0, 1)$. For an ordered sequence $(Y_n(k), k = 1, \ldots, n)$, let
$Y_n^{rev}(k) = Y_n(n-k+1)$ be the same sequence presented in reverse. For order statistics,
we write $(Y^{\uparrow}(1), Y^{\uparrow}(2), \ldots)$ for the increasing sequence, and $(Y^{\downarrow}(1), Y^{\downarrow}(2), \ldots)$ for the
decreasing sequence. Throughout this paper we use $(X_1, \ldots, X_n)$ for the underlying i.i.d
sequence, and $(X_n[1], \ldots, X_n[n])$ for its size-biased permutation.



2. Connections to the work of Perman-Pitman-Yor [27]



PPY considered the size-biased permutation of the sequence of ranked jumps of a subordinator
$T$, that is, a non-decreasing process with no drift component, right continuous
paths, stationary independent increments, $T(0) = 0$, and

\[ T(s) = \sum_i X_i 1\{\sigma_i \le s\} \]

for $0 \le s \le 1$, where $\{(\sigma_i, X_i), i \ge 1\}$ is the countable random set of points of a Poisson
point process (p.p.p) on $(0, \infty)^2$ with intensity measure $ds\,\Lambda(dx)$. The $X_i$'s are the jumps
of the subordinator $T$. They assumed $\Lambda(dx) = \rho(x)\,dx$ for some density $\rho$, $\Lambda(0, \infty) = \infty$,
$\Lambda(1, \infty) < \infty$, and $\int_0^1 x \Lambda(dx) < \infty$. So $T := T(1) > 0$ is an infinitely divisible random variable
with Lévy measure $\Lambda$ and no drift component in its Lévy-Khintchine representation (see,
for example, [19, §15]).

Proposition 1 states that the PPY setup is the limit in distribution of finite i.i.d s.b.p.
This is an application of the principle that convergence of order statistics is equivalent to 
convergence of size-biased permutation; see [15] for a precise statement. 

Proposition 1 (Gnedin [15]). Let $X = (X^{\downarrow}(1), X^{\downarrow}(2), \ldots)$ be the infinite sequence of
ranked jumps of a subordinator, whose sum $T = \sum_{i \ge 1} X^{\downarrow}(i)$ is a positive, infinitely divisible,
a.s. finite random variable whose Lévy-Khintchine representation has no drift component.
Let $(X^n, n \ge 1)$ be an i.i.d positive triangular array, that is, $X^n = (X^n_1, \ldots, X^n_n)$
where the $X^n_i$ are i.i.d, a.s. positive, and $T_n = \sum_{i=1}^n X^n_i \xrightarrow{d} T$ as $n \to \infty$. Then as $n \to \infty$,

\[ \text{s.b.p of } X^n \xrightarrow{f.d.d.} \text{s.b.p of } X, \]

meaning convergence in distribution of finite dimensional distributions.

Proof. Recall that the sequence of decreasing order statistics $(X^{n,\downarrow}(1), \ldots, X^{n,\downarrow}(n))$ converges
in distribution to $X$ [19, §15]. Since $T_n, T > 0$ a.s. and $T_n \xrightarrow{d} T$, the normalized
sequence $(X^{n,\downarrow}(1)/T_n, \ldots, X^{n,\downarrow}(n)/T_n)$ converges in distribution to $X/T$. The result follows
from [15, Theorem 3]. □



One can obtain another finite version of the PPY setup by setting $\Lambda(0, \infty) < \infty$,
but this can be reduced to finite i.i.d s.b.p by conditioning. Specifically, $T$ is now a
compound Poisson process, where the subordinator waits for an exponential time with
rate $\Lambda(0, \infty)$ before making a jump $X$, whose length is independent of the waiting time
and distributed as $P(X \le t) = \Lambda(0, t]/\Lambda(0, \infty)$ [2]. If $(X_1, X_2, \ldots)$ is the sequence of
successive jumps of $(T_s, s \ge 0)$, then $(X_1, X_2, \ldots, X_N)$ is the sequence of successive jumps
of $(T_s, 0 \le s \le 1)$, where $N$ is a Poisson random variable with mean $\Lambda(0, \infty)$, independent
of the jump sequence $(X_1, X_2, \ldots)$. For $N > 0$, properties of the size-biased permutation
of $(X_1, \ldots, X_N)$ can be deduced from those of a finite i.i.d size-biased permutation by
conditioning on $N$.

2.1. Joint distribution, Markov property and stick-breaking. Recall that $(X_1, \ldots, X_n)$
is an i.i.d sequence from distribution $F$, and $(X_n[1], \ldots, X_n[n])$ is its size-biased permutation.
Let $T_{n-k} = X_n[k+1] + \cdots + X_n[n]$ denote the sum of the last $n - k$ terms in an
i.i.d size-biased permutation of length $n$. We now derive the joint distribution of the first
$k$ terms $X_n[1], \ldots, X_n[k]$ when $F$ has density $\nu_1$. The parallels in PPY are discussed in
Section 2.2.

Proposition 2 (Barouch-Kaufman [1]). For $1 \le k \le n$, let $\nu_k$ be the density of $S_k$, the
sum of $k$ i.i.d random variables with distribution $F$. Then

\[ P(X_n[1] \in dx_1, \ldots, X_n[k] \in dx_k) = \frac{n!}{(n-k)!} \Big( \prod_{j=1}^{k} x_j\,\nu_1(x_j)\,dx_j \Big) \int_0^\infty \nu_{n-k}(s) \prod_{j=1}^{k} (x_j + \cdots + x_k + s)^{-1}\,ds. \tag{2} \]



Proof. Let $\sigma$ denote the random permutation on $n$ letters defined by size-biased permutation
as in (1). Then there are $n!/(n-k)!$ distinct possible values for $(\sigma_1, \ldots, \sigma_k)$. By



4 JIM PITMAN AND NGOC MAI TRAN 

exchangeability of the underlying i.i.d random variables $X_1, \ldots, X_n$, it is sufficient to consider
$\sigma_1 = 1, \ldots, \sigma_k = k$. Note that

\[ P\Big( (X_1, \ldots, X_k) \in dx_1 \cdots dx_k,\ \sum_{j=k+1}^{n} X_j \in ds \Big) = \nu_{n-k}(s)\,ds \prod_{j=1}^{k} \nu_1(x_j)\,dx_j. \]

Thus, restricted to $\sigma_1 = 1, \ldots, \sigma_k = k$, the probability of observing $(X_n[1], \ldots, X_n[k]) \in dx_1 \cdots dx_k$
and $T_{n-k} \in ds$ is precisely

\[ \frac{x_1}{x_1 + \cdots + x_k + s} \cdot \frac{x_2}{x_2 + \cdots + x_k + s} \cdots \frac{x_k}{x_k + s}\ \nu_{n-k}(s) \Big( \prod_{j=1}^{k} \nu_1(x_j)\,dx_j \Big) ds. \]

By summing over the $n!/(n-k)!$ possible values for $(\sigma_1, \ldots, \sigma_k)$, and integrating out the sum $T_{n-k}$,
we arrive at (2). □



Note that $X_n[k] = T_{n-k+1} - T_{n-k}$ for $k = 1, \ldots, n-1$. Thus we can rewrite (2) in terms
of the joint law of $(T_n, T_{n-1}, \ldots, T_{n-k})$:

\[ P(T_n \in dt_0, \ldots, T_{n-k} \in dt_k) = \frac{n!}{(n-k)!} \Big( \prod_{i=0}^{k-1} \frac{t_i - t_{i+1}}{t_i}\,\nu_1(t_i - t_{i+1}) \Big) \nu_{n-k}(t_k)\,dt_0 \cdots dt_k. \tag{3} \]

Rearranging (3) yields the following result, which appeared as an exercise in [6, §2.3].

Corollary 3 (Chaumont-Yor [6]). The sequence $(T_n, T_{n-1}, \ldots, T_1)$ is an inhomogeneous
Markov chain with transition probability

\[ P(T_{n-k} \in ds \mid T_{n-k+1} = t) = (n-k+1)\,\frac{t-s}{t}\,\nu_1(t-s)\,\frac{\nu_{n-k}(s)}{\nu_{n-k+1}(t)}\,ds, \tag{4} \]

for $k = 1, \ldots, n-1$. Together with $T_n \stackrel{d}{=} S_n$, equation (4) specifies the joint law in (2),
and vice versa.

An equivalent way to state (4) is that for $k \ge 1$, conditioned on $T_{n-k+1} = t$, $X_n[k]$ is
distributed as the first size-biased pick out of $n-k+1$ i.i.d random variables conditioned
to have sum $S_{n-k+1} = t$. This provides a recursive way to generate a finite i.i.d s.b.p:
first generate $T_n$ (which is distributed as $S_n$). Conditioned on the value of $T_n$, generate
$T_{n-1}$ via (4), and let $X_n[1]$ be the difference. Now conditioned on the value of $T_{n-1}$, generate
$T_{n-2}$ via (4), let $X_n[2]$ be the difference, and so on. Let us explore this recursion from
a different angle by considering the ratio $W_{n,k} := X_n[k]/T_{n-k+1}$ and its complement,
$\bar W_{n,k} = 1 - W_{n,k} = T_{n-k}/T_{n-k+1}$. For $k \ge 2$, note that

\[ \frac{X_n[k]}{T_n} = \frac{X_n[k]}{T_{n-k+1}} \cdot \frac{T_{n-k+1}}{T_{n-k+2}} \cdots \frac{T_{n-1}}{T_n} = W_{n,k} \prod_{i=1}^{k-1} \bar W_{n,i}. \tag{5} \]



The variables $\bar W_{n,i}$ can be interpreted as residual fractions in a stick-breaking scheme:
start with a stick of length 1. Choose a point on the stick according to the distribution of $W_{n,1}$,
'break' the stick into two pieces, discard the piece of length $W_{n,1}$ and rescale the remaining
piece to have length 1. Repeat this procedure $k$ times; then (5) is the fraction broken
off at step $k$ relative to the original stick length.
Together with $T_n \stackrel{d}{=} S_n$, one could use (5) to compute the marginal distribution for
$X_n[k]$ in terms of the ratios $W_{n,i}$. In general the $W_{n,i}$ are not necessarily independent,
and their joint distributions need to be worked out from (4). However, when $F$ has gamma
distribution, $T_n, W_{n,1}, \ldots, W_{n,k}$ are independent, and (5) leads to the following result of
Patil and Taillie [26].



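Proposition 4 (Patil-Taillie [26]). If $F$ has distribution gamma$(a, \lambda)$ for some $a, \lambda > 0$,
then $T_n$ and the $W_{n,1}, \ldots, W_{n,n-1}$ in (5) are mutually independent. In this case,

\[ X_n[1] = \gamma_0 \beta_1 \]
\[ X_n[2] = \gamma_0 \bar\beta_1 \beta_2 \]
\[ \cdots \]
\[ X_n[n-1] = \gamma_0 \bar\beta_1 \bar\beta_2 \cdots \bar\beta_{n-2} \beta_{n-1} \]
\[ X_n[n] = \gamma_0 \bar\beta_1 \bar\beta_2 \cdots \bar\beta_{n-1} \]

where $\gamma_0$ has distribution gamma$(an, \lambda)$, $\beta_k$ has distribution beta$(a+1, (n-k)a)$, $\bar\beta_k = 1 - \beta_k$
for $1 \le k \le n-1$, and the random variables $\gamma_0, \beta_1, \ldots, \beta_{n-1}$ are independent.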

Proof. By assumption $S_k$ has distribution gamma$(ak, \lambda)$. One substitutes the density of
gamma$(ak, \lambda)$ for $\nu_k$ in (4), and the result follows by direct computation. □
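The representation in Proposition 4 translates directly into a sampler for the gamma case. The sketch below is illustrative only: the function name is hypothetical, and it relies on the standard-library conventions that `random.gammavariate` takes a shape and a scale (so the rate $\lambda$ enters as `1 / lam`) and that `random.betavariate` takes the two beta parameters.

```python
import random

def gamma_sbp_stick_breaking(n, a, lam, rng=random):
    """Sample (T_n, [X_n[1], ..., X_n[n]]) for F = gamma(a, lam) via Proposition 4."""
    # gamma_0 is distributed as T_n = S_n; gammavariate uses (shape, scale).
    total = rng.gammavariate(a * n, 1.0 / lam)
    residual = 1.0  # product of the residual fractions (1 - beta_i) so far
    terms = []
    for k in range(1, n):
        beta_k = rng.betavariate(a + 1.0, (n - k) * a)
        terms.append(total * residual * beta_k)
        residual *= 1.0 - beta_k
    terms.append(total * residual)  # X_n[n] is whatever remains of the stick
    return total, terms
```

By construction the terms telescope, so they sum to the sampled value of $T_n$ up to floating-point error.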



In the subordinator setting, McCloskey [24] proved the analogue of Proposition 4 and
Perman-Pitman-Yor [27] proved the converse; see Proposition 6 below. Using the same
idea, we obtain the following converse to Proposition 4, which appears to be new.

Corollary 5. If $T_n$ is independent of $X_n[1]/T_n$, then $F$ is gamma$(a, \lambda)$ for some $a, \lambda > 0$.



Proof. Lukacs [23] proved that if $X, Y$ are non-degenerate, positive independent random
variables, then $X + Y$ is independent of $X/(X+Y)$ only if both $X$ and $Y$ have gamma
distributions with the same scale parameter. Note that

\[ P(X_n[1]/T_n \in du, T_n \in dt) = n u\,P\Big( \frac{X_1}{X_1 + (X_2 + \cdots + X_n)} \in du, T_n \in dt \Big) \]

and

\[ P(X_n[1]/T_n \in du) = n u\,P\Big( \frac{X_1}{X_1 + (X_2 + \cdots + X_n)} \in du \Big). \]

Since $X_n[1]/T_n$ and $T_n$ are independent, $\frac{X_1}{X_1 + (X_2 + \cdots + X_n)}$ is independent of $T_n$. The conclusion
follows by applying Lukacs' theorem to the pair $X_1$ and $(X_2 + \cdots + X_n)$. □



2.2. Parallels in PPY [27]. In the subordinator setting of PPY, let $T_k$ denote the
remaining sum after removing the first $k$ terms of the size-biased permutation of the
infinite sequence $(X^{\downarrow}(1), X^{\downarrow}(2), \ldots)$. The sequence $(T_0, T_1, \ldots)$ is a Markov chain with
stationary transition probabilities [27, equation 2.a]

\[ P(T_1 \in dt_1 \mid T_0 = t) = \frac{t - t_1}{t}\,\rho(t - t_1)\,\frac{\nu(t_1)}{\nu(t)}\,dt_1, \]

where $\nu$ denotes the density of $T_0$. Conditionally given $T_0 = t_0, T_1 = t_1, \ldots, T_n = t_n$,
the sequence of remaining terms in the s.b.p $(X[n+1], X[n+2], \ldots)$ is distributed as
$(X^{\downarrow}(1), X^{\downarrow}(2), \ldots)$ conditioned on $\sum_{i \ge 1} X^{\downarrow}(i) = t_n$, independent of the first $n$
size-biased picks [27, Theorem 4.2]. The stick-breaking representation in (5) now takes the form

\[ \frac{X[k]}{T_0} = W_k \prod_{i=1}^{k-1} \bar W_i, \tag{6} \]

where $X[k]$ is the $k$-th size-biased pick, $W_i = X[i]/T_{i-1}$, and $\bar W_i = 1 - W_i = T_i/T_{i-1}$. Proposition 4
and Corollary 5 parallel the following result.



Proposition 6 (McCloskey [24] and PPY [27]). The random variables $T_0$ and $W_1, W_2, \ldots$
in (6) are mutually independent if and only if $T_0$ has distribution gamma$(a, \lambda)$ for some
$a, \lambda > 0$. In this case, the $W_i$ are i.i.d with distribution beta$(1, a)$ for $i = 1, 2, \ldots$.

We take a small detour to explain some results related to Propositions 4 and 6 on the
characterization of size-biased permutations. For a random discrete distribution prescribed by its
probability mass function $(P_k) = (P_1, P_2, \ldots)$ with $\sum_i P_i = 1$, $P_i \ge 0$, let $(\tilde P_k)$ be the s.b.p
of $(P_k)$. One may ask when a given distribution $(Q_k)$ is the s.b.p of some distribution $(P_k)$.
Clearly $(\tilde{\tilde P}_k) \stackrel{d}{=} (\tilde P_k)$ for any $(P_k)$, thus this question is equivalent to characterizing random
discrete distributions on $\mathbb{N}$ which are invariant under size-biased permutation (ISBP). Pitman
[28, Theorem 4] gave a complete answer in terms of symmetry of a certain function
of the finite dimensional distributions. Furthermore, Pitman proved a complete characterization
of ISBP when $P_k$ can be written as the right hand side of (6) with $W_1, W_2, \ldots$
independent. In this situation, apart from some limiting cases, $(P_k)$ is ISBP if and only
if $W_i$ is distributed as beta$(1 - \alpha, \theta + i\alpha)$, $i = 1, 2, \ldots$, for certain pairs of real numbers
$(\alpha, \theta)$. The two main cases are $(0 \le \alpha < 1, \theta > -\alpha)$ and $(\alpha = -a < 0, \theta = na)$ for some
$n = 1, 2, \ldots$. In both settings, $(P_k)$ is known as the $GEM(\alpha, \theta)$ distribution. The abbreviation
GEM was introduced by Ewens, and stands for Griffiths-Engen-McCloskey.
The McCloskey case of Proposition 6 is $GEM(0, \theta)$, and the Patil-Taillie case of Proposition 4
is $GEM(-a, na)$. The sequence obtained by ranking a $GEM(\alpha, \theta)$ sequence is
called a Poisson-Dirichlet distribution with parameters $(\alpha, \theta)$ [27]. The GEM distribution
has a generative description known as the Chinese restaurant process [29, §3] and
has applications in Bayesian statistics and machine learning; see [29].



3. Connections to induced order statistics 

When $n$ i.i.d pairs $(X_i, Y_i)$ are ordered by their $Y$-values, the corresponding $X_i$'s are
called the induced order statistics of the vector $Y$, or its concomitants. Gordon [17] first
proved the following result for finite $n$, which shows that finite i.i.d s.b.p is a form of
induced order statistics. Here we state the infinite sequence version, which is a special
case of PPY [27, Lemma 4.4].



Proposition 7 (PPY [27]). Let $(x_1, x_2, \ldots)$ be a fixed positive sequence with finite sum
$t = \sum_{i \ge 1} x_i$. For $i = 1, 2, \ldots$, let $Y_i = \epsilon_i/x_i$ where the $\epsilon_i$ are i.i.d standard exponentials.
Let $(Y^{\uparrow}(1), Y^{\uparrow}(2), \ldots)$ be the increasing order statistics of the $Y_i$'s, and let $X^*(k)$
be the value of the $x_i$ such that $Y_i$ is $Y^{\uparrow}(k)$. Then $(X^*(k), k = 1, 2, \ldots)$ is a size-biased
permutation of $(x_1, x_2, \ldots)$. In particular, the size-biased permutation of a positive
i.i.d sequence $(X_1, \ldots, X_n)$ is distributed as the induced order statistics of the sequence
$(Y_i = \epsilon_i/X_i, 1 \le i \le n)$ for an independent sequence of i.i.d standard exponentials
$(\epsilon_1, \ldots, \epsilon_n)$.
Proof. Note that the $Y_i$'s are independent exponentials with rates $x_i$. Let $\sigma$ be the random
permutation such that $Y_{\sigma(i)} = Y^{\uparrow}(i)$. Note that $X^*(k) = x_{\sigma(k)}$. Then

\[ P(\sigma(1) = i) = P(Y_i = \min\{Y_j, j = 1, 2, \ldots\}) = \frac{x_i}{t}, \]

thus $X^*(1)$ is distributed as the first size-biased pick. In general, for distinct indices $i_1, \ldots, i_k$,
by the memoryless property of the exponential distribution,

\[ P(\sigma(k) = i_k \mid \sigma(1) = i_1, \ldots, \sigma(k-1) = i_{k-1}) \]
\[ = P(Y_{i_k} = \min\{Y_{\sigma(j)}, j \ge k\} \mid \sigma(1) = i_1, \ldots, \sigma(k-1) = i_{k-1}) \]
\[ = \frac{x_{i_k}}{t - (x_{i_1} + \cdots + x_{i_{k-1}})}. \]

Induction on $k$ completes the proof. □
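For a finite sequence, the construction in Proposition 7 can be sketched in a few lines of Python: attach an independent exponential clock $\epsilon_i/x_i$ to each term and read the terms off in increasing order of the clocks. The function name below is illustrative, not from the paper.

```python
import random

def sbp_by_exponential_clocks(sizes, rng=random):
    """Size-biased permutation of a finite positive list via Y_i = eps_i / x_i (Proposition 7)."""
    # eps_i is a standard exponential, so Y_i is exponential with rate x_i.
    clocks = [(rng.expovariate(1.0) / x, x) for x in sizes]
    clocks.sort(key=lambda pair: pair[0])  # increasing order statistics of the Y_i
    return [x for _, x in clocks]
```

Compared with successive sampling, this needs only one sort, and the same clocks couple the size-biased order with the induced order statistics used in the rest of this section.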

This proposition readily supplies simple proofs for joint, marginal and asymptotic dis- 
tributions of i.i.d s.b.p, as explicitly computed in this section. For instance, the proof of 
the following nesting property of i.i.d s.b.p, which can be cumbersome, amounts to i.i.d 
thinning. 

Corollary 8. Consider a finite i.i.d s.b.p $(X_n[1], \ldots, X_n[n])$ from a distribution $F$. For
$1 \le m \le n$, select $m$ integers $a_1 < \ldots < a_m$ by uniform sampling from $\{1, \ldots, n\}$ without
replacement. Then the subsequence $\{X_n[a_j], 1 \le j \le m\}$ is jointly distributed as a finite
i.i.d s.b.p of length $m$ from $F$.

3.1. Joint and marginal distribution revisited. Proposition 7 immediately yields the
joint distribution of the $X_n[k]$'s, which is a different formula to (2).

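Proposition 9. $(X_n[k], k = 1, \ldots, n)$ is distributed as the first coordinate of the sequence
of pairs $((X^*(k), U_n^{\downarrow}(k)), k = 1, \ldots, n)$, where $U_n^{\downarrow}(1) \ge \ldots \ge U_n^{\downarrow}(n)$ is a sequence of
uniform order statistics, and conditional on $(U_n^{\downarrow}(k) = u_k, 1 \le k \le n)$, the $X^*(k)$ are
independent with distribution $(G_{u_k}(\cdot), k = 1, \ldots, n)$, where

\[ G_u(dx) = \frac{x e^{-\phi^{-1}(u) x} F(dx)}{-\phi'(\phi^{-1}(u))}. \tag{7} \]

Here $\phi$ is the Laplace transform of $X$, that is, $\phi(y) = \int_0^\infty e^{-yx} F(dx)$, $\phi'$ its derivative
and $\phi^{-1}$ its inverse function.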

Proof. Let $(X_1, \ldots, X_n)$ be $n$ i.i.d draws from $F$, and $Y_i = \epsilon_i/X_i$ for $n$ i.i.d standard exponentials
$\epsilon_i$, independent of the $X_i$'s. Note that $\{(X_i, Y_i), 1 \le i \le n\}$ is an i.i.d sample from
the joint distribution $F(dx)[x e^{-yx}\,dy]$. Thus $Y_i$ has marginal density

\[ P(Y_i \in dy) = -\phi'(y)\,dy, \quad 0 < y < \infty, \tag{8} \]

and its distribution function is $F_Y = 1 - \phi$. Given $\{Y_i = y_i, 1 \le i \le n\}$, the $X^*(i)$ defined
in Proposition 7 are independent with conditional distribution $G(y_i, \cdot)$ where

\[ G(y, dx) = \frac{x e^{-yx} F(dx)}{-\phi'(y)}. \tag{9} \]

Equation (7) follows from writing the order statistics as the inverse transforms of ordered
uniform variables

\[ (Y^{\uparrow}(1), \ldots, Y^{\uparrow}(n)) \stackrel{d}{=} (F_Y^{-1}(U_n^{\uparrow}(1)), \ldots, F_Y^{-1}(U_n^{\uparrow}(n))) \stackrel{d}{=} (\phi^{-1}(U_n^{\downarrow}(1)), \ldots, \phi^{-1}(U_n^{\downarrow}(n))) \tag{10} \]

where $(U_n^{\downarrow}(k), k = 1, \ldots, n)$ is an independent decreasing sequence of uniform order statistics.
Note that the minus sign in (8) results in the reversal of the sequence $U_n$ in the
second equality of (10). □



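Corollary 10. For $1 \le k \le n$ and $0 < u < 1$, let

\[ f_{n,k}(u) = n \binom{n-1}{k-1} u^{n-k} (1-u)^{k-1} \tag{11} \]

be the density of the $k$-th largest of the $n$ uniform order statistics $(U_n(i), i = 1, \ldots, n)$.
Then

\[ \frac{P(X_n[k] \in dx)}{x F(dx)} = \int_0^\infty e^{-xy} f_{n,k}(\phi(y))\,dy. \tag{12} \]

In particular, for the first and last values

\[ \frac{P(X_n[1] \in dx)}{x F(dx)} = n \int_0^\infty e^{-xy} \phi(y)^{n-1}\,dy, \]

\[ \frac{P(X_n[n] \in dx)}{x F(dx)} = n \int_0^\infty e^{-xy} (1 - \phi(y))^{n-1}\,dy. \]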
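The step from Proposition 9 to (12) is a change of variables. As a sketch, writing $f_{n,k}$ for the density of the $k$-th largest of $n$ uniform order statistics and assuming $\phi(y) \to 0$ as $y \to \infty$ (which holds when $F$ has no atom at 0), one mixes $G_u$ in (7) over $u$ and substitutes $u = \phi(y)$:

```latex
\begin{align*}
P(X_n[k] \in dx)
  &= \int_0^1 f_{n,k}(u)\, G_u(dx)\, du
   = x F(dx) \int_0^1 f_{n,k}(u)\, \frac{e^{-\phi^{-1}(u) x}}{-\phi'(\phi^{-1}(u))}\, du \\
  &= x F(dx) \int_0^\infty e^{-xy} f_{n,k}(\phi(y))\, dy
   \qquad (u = \phi(y),\ du = \phi'(y)\, dy).
\end{align*}
```

Since $\phi$ is decreasing, $u$ running from 0 to 1 corresponds to $y$ running from $\infty$ to 0, and the sign of $\phi'(y)$ cancels against the reversed limits.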

Example 11. Suppose $F$ is gamma$(a, 1)$. Then $\phi(y) = \big(\frac{1}{1+y}\big)^a$, and $\phi^{-1}(u) = u^{-1/a} - 1$.
Hence $G_u$ in (7) is

\[ G_u(dx) = \frac{x e^{-(u^{-1/a} - 1)x}}{a u^{(a+1)/a}} F(dx) = \frac{u^{-(a+1)/a}}{\Gamma(a+1)}\,x^a e^{-x u^{-1/a}}\,dx. \]

That is, $G_u$ is gamma$(a+1, u^{-1/a})$.

3.2. A new beta-gamma identity. When $F$ is gamma$(a, \lambda)$, Proposition 9 gives the following
result, which is a remarkable complement to the Patil-Taillie representation in
Proposition 4.

Proposition 12. Suppose $F$ is gamma$(a, \lambda)$. Then $G_u$ is gamma$(a+1, \lambda u^{-1/a})$, and

\[ (X_n[k], k = 1, \ldots, n) \stackrel{d}{=} (\gamma_k [U_n^{\downarrow}(k)]^{1/a}, k = 1, \ldots, n) \tag{13} \]

where $\gamma_1, \ldots, \gamma_n$ are i.i.d gamma$(a+1, \lambda)$ random variables, independent of the sequence
of decreasing uniform order statistics $(U_n^{\downarrow}(1), \ldots, U_n^{\downarrow}(n))$. Alternatively, jointly
for $k = 1, \ldots, n$

\[ X_n[1] = \gamma_1 \beta_{an,1} \]
\[ X_n[2] = \gamma_2 \beta_{an,1} \beta_{an-a,1} \]
\[ \cdots \]
\[ X_n[n-1] = \gamma_{n-1} \beta_{an,1} \beta_{an-a,1} \cdots \beta_{2a,1} \]
\[ X_n[n] = \gamma_n \beta_{an,1} \beta_{an-a,1} \cdots \beta_{a,1} \]

where the $\beta_{an-ia,1}$ for $i = 0, \ldots, n-1$ are distributed as beta$(an - ia, 1)$, and they are
independent of each other and the $\gamma_k$'s.



Proof. The distribution $G_u$ is computed in the same way as in Example 11, and (13) follows
readily from Proposition 9. □
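As a hedged sanity check on (13), one can compare a Monte Carlo estimate of $E\,X_n[1]$, simulated through the exponential-clock coupling of Proposition 7, with the mean read off (13): taking $\lambda = 1$, $E[\gamma_1] = a + 1$ and $E[(U_n^{\downarrow}(1))^{1/a}] = an/(an+1)$. The function names and parameter values below are illustrative.

```python
import random

def first_pick_mean_mc(n, a, trials, rng):
    """Monte Carlo estimate of E X_n[1] for F = gamma(a, 1)."""
    total = 0.0
    for _ in range(trials):
        xs = [rng.gammavariate(a, 1.0) for _ in range(n)]
        # The first size-biased pick carries the smallest clock eps_i / x_i.
        total += min(xs, key=lambda x: rng.expovariate(1.0) / x)
    return total / trials

def first_pick_mean_formula(n, a):
    """E[gamma_1] * E[(max of n uniforms)^(1/a)] = (a + 1) * a*n / (a*n + 1)."""
    return (a + 1.0) * a * n / (a * n + 1.0)
```

The same value also follows from Proposition 4, since $E[\gamma_0]\,E[\beta_1] = an \cdot (a+1)/(an+1)$ when $\lambda = 1$.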
A direct comparison of the two different representations in Propositions 4 and 12 creates
$n$ distributional identities. For example, equating the two representations of $X_n[1] = X_n^{rev}[n]$ shows that the
following two means of creating a product of independent random variables produce the
same result in law:

\[ \beta_{a+1,(n-1)a}\,\gamma_{an,\lambda} \stackrel{d}{=} \beta_{an,1}\,\gamma_{a+1,\lambda} \tag{14} \]

where $\gamma_{r,\lambda}$ and $\beta_{a,b}$ denote random variables with distributions gamma$(r, \lambda)$ and beta$(a, b)$,
respectively. In fact, by comparing the consecutive ratios $X_n^{rev}[i]/X_n^{rev}[i+1]$ for $i = 1, \ldots, n-1$, the
other $n - 1$ identities all reduce to the following single equation after setting $ia - a = b$.

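Corollary 13. For $a, b, \lambda > 0$,

\[ \frac{1 - \beta_{a+1,a+b}}{\beta_{a+1,a+b}}\,\beta_{a+1,b} \stackrel{d}{=} \frac{\gamma_{a+1,\lambda}}{\gamma'_{a+1,\lambda}}\,\beta_{a+b,1}, \tag{15} \]

where the random variables $\beta_{a+1,a+b}, \beta_{a+1,b}, \beta_{a+b,1}, \gamma_{a+1,\lambda}, \gamma'_{a+1,\lambda}$ are mutually independent,
beta and gamma distributed with parameters indicated by subscripts.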



The validity of this identity can be checked by comparing moments. Conversely, (14)
and (15) allow one to go between Proposition 12 and Proposition 4.
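The moment comparison can be carried out numerically. The sketch below evaluates both sides of (14) and (15) through the standard closed forms for fractional moments of beta and gamma variables, multiplying moments across independent factors; the helper names are illustrative, and the parameter ranges quoted in the docstrings are the ones in which each moment is finite.

```python
from math import exp, lgamma, log

def beta_moment(p, q, s):
    """E[B^s] for B ~ beta(p, q), finite for s > -p."""
    return exp(lgamma(p + s) + lgamma(p + q) - lgamma(p) - lgamma(p + q + s))

def gamma_moment(r, lam, s):
    """E[G^s] for G ~ gamma(r, lam) with rate lam, finite for s > -r."""
    return exp(lgamma(r + s) - lgamma(r) - s * log(lam))

def odds_moment(p, q, s):
    """E[((1 - B) / B)^s] for B ~ beta(p, q), finite for -q < s < p."""
    return exp(lgamma(p - s) + lgamma(q + s) - lgamma(p) - lgamma(q))

def identity_14_sides(a, n, lam, s):
    """s-th moments of the two sides of (14); requires n >= 2."""
    lhs = beta_moment(a + 1, (n - 1) * a, s) * gamma_moment(a * n, lam, s)
    rhs = beta_moment(a * n, 1, s) * gamma_moment(a + 1, lam, s)
    return lhs, rhs

def identity_15_sides(a, b, lam, s):
    """s-th moments of the two sides of (15), for -(a + b) < s < a + 1."""
    lhs = odds_moment(a + 1, a + b, s) * beta_moment(a + 1, b, s)
    rhs = (gamma_moment(a + 1, lam, s) * gamma_moment(a + 1, lam, -s)
           * beta_moment(a + b, 1, s))
    return lhs, rhs
```

Agreement of these moments on an interval of $s$ around 0 is what identifies the two laws; in (15) the rate $\lambda$ cancels between the two gamma factors.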



4. Asymptotics of the last u fraction of the size-biased permutation

In this section we derive Glivenko-Cantelli and Donsker-type theorems for the distribution
of the last $u$ fraction of terms in a finite i.i.d s.b.p. These are special cases of
more general statements which hold for arbitrary induced order statistics in $d$ dimensions
(see Section 4.3). Features pertaining to i.i.d s.b.p are presented in the following lemma,
which has an interesting successive sampling interpretation; see Section 4.1.
Lemma 14. Suppose $F$ has support on $[0, \infty)$ and finite mean. For $u \in (0, 1)$, define

\[ F_u(dx) = \frac{e^{-x \phi^{-1}(u)}}{u} F(dx) \tag{16} \]

and extend the definition to $\{0, 1\}$ by continuity, where $\phi$ is the Laplace transform of $F$
as in Proposition 9. Then $F_u$ is a probability distribution on $[0, \infty)$ for all $u \in [0, 1]$, and
$G_u$ in (7) satisfies

\[ G_u(dx) = x F_u(dx)/\mu_u \tag{17} \]

where $\mu_u = \int x F_u(dx) = \frac{-\phi'(\phi^{-1}(u))}{u}$. Furthermore,

\[ \int_0^u G_s(dx)\,ds = u F_u(dx) \tag{18} \]

for all $u \in [0, 1]$. In other words, the density

\[ f(u, x) = F_u(dx)/F(dx) = u^{-1} e^{-x \phi^{-1}(u)} \]

of $F_u$ with respect to $F$ solves the differential equation

\[ \frac{d}{du} [u f(u, x)] = \frac{x f(u, x)}{\mu_u} \tag{19} \]

with boundary condition $f(1, x) = 1$.

Proof. By direct computation. □ 
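As an illustration of the direct computation, take $F$ to be gamma$(a, 1)$, so that $\phi^{-1}(u) = u^{-1/a} - 1$ and $-\phi'(\phi^{-1}(u)) = a u^{(a+1)/a}$ as in Example 11. Under these assumptions the objects in the lemma become explicit:

```latex
\begin{align*}
F_u(dx) &= \frac{e^{-x(u^{-1/a} - 1)}}{u} \cdot \frac{x^{a-1} e^{-x}}{\Gamma(a)}\, dx
         = \frac{(u^{-1/a})^{a}}{\Gamma(a)}\, x^{a-1} e^{-x u^{-1/a}}\, dx, \\
\mu_u   &= \frac{a}{u^{-1/a}} = a\, u^{1/a} = \frac{a\, u^{(a+1)/a}}{u}
         = \frac{-\phi'(\phi^{-1}(u))}{u}, \\
\frac{x F_u(dx)}{\mu_u} &= \frac{(u^{-1/a})^{a+1}}{\Gamma(a+1)}\, x^{a} e^{-x u^{-1/a}}\, dx.
\end{align*}
```

So in this case $F_u$ is gamma$(a, u^{-1/a})$, and $x F_u(dx)/\mu_u$ is gamma$(a+1, u^{-1/a})$, which agrees with the $G_u$ computed in Example 11.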
We now state a Glivenko-Cantelli-type theorem which applies to size-biased permutation
of finite deterministic sequences. Versions of this result are known in the literature
[4,17,18,32]; see discussions in Section 4.3. We offer an alternative proof using
induced order statistics.

Theorem 15. Let $((x_i^n, 1 \le i \le n), n \ge 1)$ be a deterministic triangular array of positive
numbers with corresponding c.d.f sequence $(E_n, n \ge 1)$. Suppose

\[ \sup_x |E_n(x) - F(x)| \to 0 \quad \text{as } n \to \infty \tag{20} \]

for some distribution $F$ on $(0, \infty)$. Let $u \in (0, 1]$. Let $E_{n,u}(\cdot)$ be the empirical distribution
of the last $\lfloor nu \rfloor$ terms in a size-biased permutation of the sequence $(x_i^n, 1 \le i \le n)$. Then
for each $\delta \in (0, 1)$,

\[ \sup_{u \in [\delta, 1]} \sup_I |E_{n,u}(I) - F_u(I)| \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty, \tag{21} \]

where $I$ ranges over all subintervals of $(0, \infty)$.

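Proof. Define $Y_i^n = \epsilon_i/x_i^n$ for $i = 1, \ldots, n$, where the $\epsilon_i$'s are i.i.d standard exponentials as in
Proposition 7. Let $H_n$ be the empirical distribution function (e.d.f) of the $Y_i^n$'s,

\[ H_n(y) := \frac{1}{n} \sum_{i=1}^{n} 1\{Y_i^n \le y\}. \]

Let $J_n$ denote the e.d.f of the pairs $(x_i^n, Y_i^n)$. By Proposition 9, the last $\lfloor nu \rfloor$ terms of the
size-biased permutation are those with the largest values of $Y_i^n$, so

\[ E_{n,u}(I) = \frac{1}{\lfloor nu \rfloor} \sum_{i=1}^{n} 1\{x_i^n \in I\}\,1\{Y_i^n > H_n^{-1}(1-u)\} = \frac{n}{\lfloor nu \rfloor} J_n(I \times (H_n^{-1}(1-u), \infty)). \tag{22} \]

Fix $\delta \in (0, 1)$, and let $u \in [\delta, 1]$. Let $\phi$ be the Laplace transform of $F$ and $J$ the joint law
of $(X, \epsilon/X)$, where $X$ is a random variable with distribution $F$, and $\epsilon$ is an independent
standard exponential. Note that $\frac{1}{u} J(I \times (\phi^{-1}(u), \infty)) = F_u(I)$. Thus

\[ E_{n,u}(I) - F_u(I) = \Big( \frac{n}{\lfloor nu \rfloor} J_n(I \times (H_n^{-1}(1-u), \infty)) - \frac{n}{\lfloor nu \rfloor} J_n(I \times (\phi^{-1}(u), \infty)) \Big) \]
\[ \qquad + \Big( \frac{n}{\lfloor nu \rfloor} J_n(I \times (\phi^{-1}(u), \infty)) - \frac{1}{u} J(I \times (\phi^{-1}(u), \infty)) \Big). \tag{23} \]

Let us consider the second term. Note that

\[ E\,J_n(I \times (\phi^{-1}(u), \infty)) = \int_0^\infty e^{-t \phi^{-1}(u)}\,1\{t \in I\}\,E_n(dt). \]

Since $E_n$ converges to $F$ uniformly and $e^{-t \phi^{-1}(u)}$ is bounded for all $t \in (0, \infty)$ and $u \in [\delta, 1]$,

\[ \sup_{u \in [\delta, 1]} \sup_I \Big| \frac{n}{\lfloor nu \rfloor} J_n(I \times (\phi^{-1}(u), \infty)) - \frac{1}{u} J(I \times (\phi^{-1}(u), \infty)) \Big| \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty. \]

Let us consider the first term. Since $J_n$ is continuous in the second variable, it is sufficient
to show that

\[ \sup_{u \in [\delta, 1]} |H_n^{-1}(1-u) - \phi^{-1}(u)| \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty. \tag{24} \]

To achieve this, let $A_n$ denote the 'average' measure

\[ A_n(y) := \frac{1}{n} \sum_{i=1}^{n} P(Y_i^n \le y) = 1 - \int_0^\infty e^{-xy}\,dE_n(x). \]

A theorem of Wellner [36, Theorem 1] states that if the sequence of measures $(A_n, n \ge 1)$
is tight, then the Prohorov distance between $H_n$ and $A_n$ converges a.s. to 0 as $n \to \infty$.
In this case, since $E_n$ converges to $F$ uniformly, $A_n$ converges uniformly to $1 - \phi$. Thus
$H_n$ converges uniformly to $1 - \phi$, and (24) follows. □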



In particular, when $E_n$ is the e.d.f of $n$ i.i.d picks from $F$, then (20) is satisfied a.s.
by the Glivenko-Cantelli theorem. Thus Theorem 15 implies that $F_u$ is the limit of the
empirical distribution function (e.d.f) of the last $u$ fraction in a finite i.i.d s.b.p.

Corollary 16. For $u \in (0, 1]$, let $F_{n,u}(\cdot)$ denote the empirical distribution of the last $\lfloor nu \rfloor$
values of an i.i.d s.b.p with length $n$. For each $\delta \in (0, 1)$, as $n \to \infty$,

\[ \sup_{u \in [\delta, 1]} \sup_I |F_{n,u}(I) - F_u(I)| \xrightarrow{a.s.} 0, \tag{25} \]

where $I$ ranges over all subintervals of $(0, \infty)$.
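A quick simulation illustrates the corollary in the simplest case. Assume $F$ is the standard exponential, that is gamma$(1, 1)$; then $\phi^{-1}(u) = 1/u - 1$ and (16) gives $F_u(dx) = u^{-1} e^{-x/u}\,dx$, an exponential distribution with mean $u$. The sketch below (with an illustrative function name) builds the s.b.p through the exponential clocks of Proposition 7 and averages its last $\lfloor nu \rfloor$ terms.

```python
import random

def last_fraction_mean(n, u, rng):
    """Mean of the last floor(n*u) terms of an i.i.d s.b.p of length n with F = Exp(1)."""
    xs = [rng.expovariate(1.0) for _ in range(n)]
    # Sorting by Y_i = eps_i / X_i in increasing order yields the size-biased order.
    ordered = sorted(xs, key=lambda x: rng.expovariate(1.0) / x)
    m = int(n * u)
    tail = ordered[n - m:]
    return sum(tail) / m
```

For large $n$ the returned value should be close to $u$, the mean of $F_u$ in this example.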



Under the settings in Theorem 15, one can also derive a Donsker-type result via entropy
methods described in [35, §2], complementing similar results in the literature [4, §5]. In
particular, the first term of (23) depends on the fluctuation of empirical uniform order
statistics around its limit. Its scaling depends on the rate of convergence of $E_n$ to $F$.
The second term of (23) converges in distribution, after rescaling by $\sqrt{n}$, by a version of
Donsker's theorem, to a Brownian bridge. We omit the details. A similar decomposition
applies in Theorem 17.



4.1. A heuristic interpretation. Since $X_n^{rev}[\lfloor nu \rfloor]$ converges in distribution to $G_u$ for
$u \in [0, 1]$, Corollary 16 lends a sampling interpretation to Lemma 14. Equation (19) has
the heuristic interpretation as characterizing the evolution of the mass at $x$ over time $u$
in a successive sampling scheme. To be specific, consider a successive sampling scheme
on a large population of $N$ individuals, with species size distribution $H$. Scale time such
that at time $u$, for $0 \le u \le 1$, there are $Nu$ individuals (from various species) remaining
to be sampled. Let $H_u$ denote the distribution of species sizes at time $u$, and fix the
bin $(x, x + dx)$ of width $dx$ on $(0, \infty)$. Then $N u H_u(dx)$ is the number of individuals
whose species size lies in the range $(x, x + dx)$ at time $u$. Thus $\frac{d}{du} N u H_u(dx)$ is the rate of
individuals to be sampled from this range of species size at time $u$. The probability of an
individual whose species size is in $(x, x + dx)$ being sampled at time $u$ is $\frac{x H_u(dx)}{\int_0^\infty x H_u(dx)}$. As
we scaled time such that $u \in [0, 1]$, in time $du$ we sample $N\,du$ individuals. Thus

\[ \frac{d}{du} N u H_u(dx) = N \frac{x H_u(dx)}{\int_0^\infty x H_u(dx)}. \]

Let $f(u, x) = H_u(dx)/H(dx)$; then as a function in $u$, the above equation reduces to (19).



4.2. Functional central limit theorem. We now state a functional central limit theorem
for i.i.d s.b.p. This is a special case of the general result of Davydov and Egorov [10]
for induced order statistics.



Theorem 17 (Davydov and Egorov, [10]). Suppose the first two moments of $F$ are finite.
For a distribution $H$, let $\mu(H)$, $\sigma(H)$ denote its mean and standard deviation. For $u \in (0, 1]$, define

\[ \xi_n(u) = \frac{1}{n} \sum_{j=1}^{\lfloor nu \rfloor} X_n^{rev}[j], \qquad \eta_n(u) = \frac{1}{n} \sum_{j=1}^{n} X_j\,1\{Y_j > \phi^{-1}(u)\}, \qquad m(u) = \int_0^u \mu(G_s)\,ds = u\,\mu(F_u). \]

Then $\xi_n \to m$, $\eta_n \to m$ uniformly on $[0, 1]$, and

\[ \sqrt{n}\,(\xi_n - m) \Rightarrow \alpha, \qquad \sqrt{n}\,(\eta_n - m) \Rightarrow \eta \]

in the Skorokhod topology, where

\[ \alpha(u) = \int_0^u \sigma(G_s)\,dW(s), \qquad \eta(u) = \alpha(u) + \int_0^u m(s)\,dV(s). \]

$W$ is a standard Brownian motion, and $V$ is a Brownian bridge, independent of $W$.



The difference in $\xi_n$ and $\eta_n$ is the fluctuation of empirical uniform order statistics around
its limit. Recall the analogue discussed at the end of Theorem 15. The proof of Theorem
17 can be found in [10], together with similar statements on functional law of iterated
logarithm for the processes $\eta_n, \xi_n$.



4.3. Historical notes on induced order statistics and successive sampling. Induced
order statistics were first introduced by David [8] and independently by Bhattacharya
[3]. Typical applications stem from modeling an indirect ranking procedure,
where subjects are ranked based on their $Y$-attributes although the real interest lies in
ranking their $X$-attributes, which are difficult to obtain at the moment where the ranking
is required.^1 For example in cattle selection, $Y$ may represent the genetic makeup, for
which the cattle are selected for breeding, and $X$ represents the milk yields of their female
offspring. Thus a portion of this literature focuses on comparing the distribution of induced
order statistics to that of usual order statistics [9,16,25,37]. The most general statement
on asymptotic distributions is obtained by Davydov and Egorov [10], who proved a functional
central limit theorem and a functional law of the iterated logarithm for the process
$S_{n,u}$ under tight assumptions. Their theorem translates directly into Theorem 17 for finite
i.i.d s.b.p as discussed in Section 4.2. Various versions of results in Section 4 including
Theorem 17 are also known in the successive sampling community [4,17,18,32,33]. For
example, Bickel, Nair and Wang [4] proved Theorem 15 with convergence in probability
when $E_n$ and $F$ have the same discrete support on finitely many values.

5. Poisson coupling of size-biased permutation and order statistics

Comparisons between the distribution of induced order statistics and order statistics
of the same sequence have been studied in the literature [9,16,25,37]. However, finite
i.i.d s.b.p has the special feature that there exists an explicit coupling between these two
sequences as described in Proposition 7. Using this fact, we now derive Theorem 19, which



^One often uses X for the variable to be ordered, and Y for the induced variable, with the idea that 
Y is to be predicted. Here we use X for the induced order statistics since Xn[k] has been used for the 
size-biased permutation. The role of X and Y in our case is interchangeable, as evident when one writes 
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gives a Poisson coupling between the last k size-biased terms X^'^^[l], . . . , X^'^^[/c] and the 
k smallest order statistics X^(l), . . . X^(A;) as n — )• oo. The existence of a Poisson 
coupling should not be surprising, since it is well-known that the increasing sequence of 
order statistics X^(2), . . .) converges to points in a Poisson point process (p.p-p) 

whose intensity measure depends on the behavior of F near the infimum of its support, 
which is in our case. This standard result in order statistics and extreme value theory 



dates back to Renyi 31 , and can be found in 11 . 

Our theorem is closely related to a result in PPY [27, §4]. When $(X^{\downarrow}(1), X^{\downarrow}(2), \ldots)$ are ranked jumps of a subordinator, these authors noted that one can couple the size-biased permutation with the order statistics via the following p.p.p

\[ N(\cdot) := \sum_{k \ge 1} 1[(X[k], Y^{\uparrow}(k)) \in \cdot\,] = \sum_{k \ge 1} 1[(X^{\downarrow}(k), Y_k) \in \cdot\,]. \tag{26} \]

$N(\cdot)$ has measure $m(dx\,dy) = x e^{-xy} \Lambda(dx)\,dy$. The first expression in (26) defines a scatter of $(x, y)$ values in the plane listed in increasing $y$ values, and the second represents the same scatter listed in decreasing $x$ values. Since $\sum_{i \ge 1} X^{\downarrow}(i)$ is finite, the $x$-marginal of the points in (26) has the distribution of the size-biased permutation of the infinite sequence $(X^{\downarrow}(1), X^{\downarrow}(2), \ldots)$, because it prescribes the joint distribution of the first $k$ terms $X[1], \ldots, X[k]$ for any finite $k$. PPY use this p.p.p representation to generalize size-biased permutation to $h$-biased permutation, where the 'size' of a point $x$ is replaced by an arbitrary strictly positive function $h(x)$; see [27, §4].

More generally, one obtains a random permutation of $\mathbb{N}$ from ranking points $(x_i, y_i)$ in a Poisson scatter $N(\cdot)$ on $(0, \infty)^2$ according to either their $x$ or $y$ values. For $i = 1, 2, \ldots$, let $(x^*_i)$ and $(y^*_i)$ denote the induced order statistics of the sequences $(x_i)$ and $(y_i)$ obtained by ranking points by their $y$ and $x$ values in increasing order, respectively. For $j, k = 1, 2, \ldots$, define sequences of integers $(K_j)$, $(J_k)$ such that $x^{\mathrm{rev}}(J_k) = x^*_k$, $y^{\mathrm{rev}}(K_j) = y^*_j$; see Figure 1.
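For a finite scatter, the two rankings and the permutations they induce can be computed directly. The sketch below is illustrative only: the function name `rank_permutations` and the five coordinates are hypothetical, chosen so that the ranks agree with those quoted in the caption of Figure 1, and all x-values and y-values are assumed distinct.

```python
def rank_permutations(points):
    """Return (J, K) as 1-based lists for a finite scatter of (x, y) points.

    J[k-1] is the x-rank of the point with the k-th smallest y-value;
    K[j-1] is the y-rank of the point with the j-th smallest x-value.
    Assumes all x-values and all y-values are distinct.
    """
    n = len(points)
    by_x = sorted(range(n), key=lambda i: points[i][0])  # indices in increasing x
    by_y = sorted(range(n), key=lambda i: points[i][1])  # indices in increasing y
    x_rank = {i: r + 1 for r, i in enumerate(by_x)}
    y_rank = {i: r + 1 for r, i in enumerate(by_y)}
    J = [x_rank[i] for i in by_y]
    K = [y_rank[i] for i in by_x]
    return J, K


# Hypothetical coordinates (not read off the figure) reproducing its ranks.
points = [(5.0, 1.0), (3.0, 2.0), (1.0, 3.0), (4.0, 4.0), (2.0, 5.0)]
J, K = rank_permutations(points)
print(J)  # [5, 3, 1, 4, 2]
print(K)  # [3, 5, 2, 4, 1]
```

Composing the two lists returns the identity, which is the statement that J and K are inverse permutations.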

Suppose $N(\cdot)$ has intensity measure $m$ such that for all $x, y \in (0, \infty)$, $m((0, x) \times (0, \infty)) < \infty$ and $m((0, \infty) \times (0, y)) < \infty$. For $j \ge 1$, conditioned on $x^{\mathrm{rev}}(j) = x$, $y^*_j = y$,

\[ K_j - 1 \overset{d}{=} \mathrm{Poisson}\big(m((x, \infty) \times (0, y))\big) + \mathrm{Binomial}\left(j - 1, \frac{m((0, x) \times (0, y))}{m((0, x) \times (0, \infty))}\right), \tag{27} \]

where the two random variables involved are independent. Similarly, for $k \ge 1$, conditioned on $x^*_k = x$, $y^{\mathrm{rev}}(k) = y$,

\[ J_k - 1 \overset{d}{=} \mathrm{Poisson}\big(m((0, x) \times (y, \infty))\big) + \mathrm{Binomial}\left(k - 1, \frac{m((0, x) \times (0, y))}{m((0, \infty) \times (0, y))}\right), \tag{28} \]

where the two random variables involved are independent. When $m$ is a product measure, as is the case of i.i.d s.b.p, it is possible to compute the marginal distribution of $K_j$ and $J_k$ explicitly for given $j, k \ge 1$. We demonstrate such computations in Proposition 20.
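The decomposition in (27) can be read off from two standard facts about Poisson processes. The following is a sketch of that reasoning rather than a statement taken from the source; the regions $A$, $B$ and the counts $N(A)$, $N(B)$ are notation introduced here for illustration.

```latex
% Condition on the point with the j-th smallest x-value sitting at (x, y).
% K_j - 1 counts the other points whose y-value is below y.
\begin{align*}
A &= (x,\infty)\times(0,y), \qquad B = (0,x)\times(0,y),\\
K_j - 1 &= N(A) + N(B).
\end{align*}
% (i) Restrictions of a Poisson process to disjoint regions are independent, so
%     N(A) ~ Poisson(m(A)), independently of the points in (0,x) x (0,infty).
% (ii) Given that exactly j-1 points fall in (0,x) x (0,infty), those points are
%      i.i.d. with law m(.) / m((0,x) x (0,infty)), hence
\[
N(B) \sim \mathrm{Binomial}\!\left(j-1,\ \frac{m((0,x)\times(0,y))}{m((0,x)\times(0,\infty))}\right).
\]
```

The same argument with the roles of the two coordinates exchanged gives (28).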

Before stating the theorem we need some technical results. The distribution of the last 
few size-biased picks depends on the behavior of F near 0, the infimum of its support. We 
shall consider the case where F has 'power law' near 0, like that of a Gamma distribution. 

Lemma 18. Suppose $F$ is supported on $(0, \infty)$ with Laplace transform $\phi$. Let $u = \phi(y)$, and let $X_u$ be a random variable distributed as $G_u(dx)$, defined previously. For $\lambda, a > 0$,

\[ F(x) \sim \frac{\lambda^a x^a}{\Gamma(a + 1)} \quad \text{as } x \to 0, \tag{29} \]

if and only if

\[ \phi(y) \sim \frac{\lambda^a}{y^a} \quad \text{as } y \to \infty. \tag{30} \]

Furthermore, (29) implies

\[ u^{-1/a} X_u \xrightarrow{d} \mathrm{gamma}(a + 1, \lambda) \quad \text{as } u \to 0. \tag{31} \]

Figure 1. A point scatter on the plane. Here $J_1 = 5, J_2 = 3, J_3 = 1, J_4 = 4, J_5 = 2$, and $K_1 = 3, K_2 = 5, K_3 = 2, K_4 = 4, K_5 = 1$. The permutations $J$ and $K$ are inverses. Conditioned on $x^{\mathrm{rev}}(2) = a$ and $y^*_2 = b$, the number of points lying in the shaded region determines $J_2 - 1$.

² The superscript 'rev' indicates that the order statistics in consideration are arranged in increasing order, as opposed to decreasing, which has been the convention of this paper.




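Proof. The equivalence of (29) and (30) follows from a version of the Karamata Tauberian Theorem [5, §1.7]. Assume (29) and (30). We shall prove (31) by looking at the Laplace transform of the non-size-biased version $X'_u$, which has distribution $F_u$. For $\theta > 0$,

\[ E(\exp(-\theta X'_u)) = \int_0^\infty u^{-1} \exp(-yx - \theta x)\, F(dx) = u^{-1} \phi(y + \theta). \tag{32} \]

Now as $y \to \infty$ and $u = \phi(y) \to 0$, for each fixed $\eta > 0$, (30) implies

\[ E(\exp(-\eta u^{-1/a} X'_u)) = \frac{\phi(y + \eta \phi(y)^{-1/a})}{\phi(y)} \sim \frac{\lambda^a (y + \eta \lambda^{-1} y)^{-a}}{\lambda^a y^{-a}} = \left(\frac{\lambda}{\lambda + \eta}\right)^a. \]

That is to say,

\[ u^{-1/a} X'_u \xrightarrow{d} \mathrm{gamma}(a, \lambda). \tag{33} \]

Since $\phi$ is differentiable, (32) implies $E(X'_u) = -\phi'(y)/\phi(y)$. Now $\phi$ has an increasing derivative $\phi'$, thus (30) implies $-\phi'(y) \sim a\lambda^a / y^{a+1}$ as $y \to \infty$. Therefore,

\[ u^{-1/a} E(X'_u) = \frac{-\phi'(y)}{\phi(y)^{1 + 1/a}} \to \frac{a}{\lambda}, \]

which is the mean of a gamma$(a, \lambda)$ random variable. Thus the random variables $u^{-1/a} X'_u$ are uniformly integrable, so for any bounded continuous function $h$ we can compute

\[ E\big(h(u^{-1/a} X_u)\big) = \frac{E\big[(u^{-1/a} X'_u)\, h(u^{-1/a} X'_u)\big]}{E(u^{-1/a} X'_u)} \to \frac{E[\gamma_{a,\lambda}\, h(\gamma_{a,\lambda})]}{E(\gamma_{a,\lambda})} = E\big(h(\gamma_{a+1,\lambda})\big), \]

where $\gamma_{b,\lambda}$ is a gamma$(b, \lambda)$ random variable. This proves (31). □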
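As a sanity check on Lemma 18, the gamma case can be worked out exactly. The sketch below assumes, consistently with (32), that $F_u(dx) = u^{-1} e^{-yx} F(dx)$ and that $G_u$ is the size-biased version of $F_u$, that is $G_u(dx) \propto x\, F_u(dx)$; this reading of $G_u$ is inferred from the proof rather than restated in this section.

```latex
% F = gamma(a, \lambda), with density \lambda^a x^{a-1} e^{-\lambda x} / \Gamma(a).
\begin{align*}
\phi(y) &= \Big(\frac{\lambda}{\lambda+y}\Big)^{a} \sim \frac{\lambda^a}{y^a},
          \qquad y \to \infty,\\
F_u(dx) &= u^{-1} e^{-yx} F(dx)
         = \frac{(\lambda+y)^a}{\Gamma(a)}\, x^{a-1} e^{-(\lambda+y)x}\, dx
          \quad\Rightarrow\quad X'_u \sim \mathrm{gamma}(a, \lambda+y),\\
G_u(dx) &\propto x\, F_u(dx)
          \quad\Rightarrow\quad X_u \sim \mathrm{gamma}(a+1, \lambda+y),\\
u^{-1/a} X_u &= \frac{\lambda+y}{\lambda}\, X_u \sim \mathrm{gamma}(a+1, \lambda).
\end{align*}
```

Under these assumptions, (31) and (33) hold with equality in distribution for every $u$ when $F$ is gamma$(a, \lambda)$, not only in the limit.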



Now suppose $F$ satisfies (29) for some $\lambda, a > 0$. By standard results in order statistics [11, Theorem 2.1.1], as $n \to \infty$,

\[ (n^{1/a} X_n^{\mathrm{rev}}(k), 1 \le k \le n) \xrightarrow{\text{f.d.d.}} (\xi^{\mathrm{rev}}(k) : 1 \le k < \infty), \tag{34} \]

where f.d.d. stands for finite dimensional distribution. Here

\[ \xi^{\mathrm{rev}}(k) = S(k)^{1/a}\, \Gamma(a + 1)^{1/a} / \lambda \tag{35} \]

for $S(k) = \varepsilon_1 + \ldots + \varepsilon_k$, where the $\varepsilon_i$ are i.i.d standard exponentials. We now present the analogue of (35) for the last few size-biased picks $X_n^{\mathrm{rev}}[1], \ldots, X_n^{\mathrm{rev}}[k]$ and the promised Poisson coupling.



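Theorem 19. Suppose that (29) holds for some $\lambda, a > 0$.

a) As $n \to \infty$,

\[ (n^{1/a} X_n^{\mathrm{rev}}[k], 1 \le k \le n) \xrightarrow{\text{f.d.d.}} (\xi^{\mathrm{rev}}[k] : 1 \le k < \infty), \tag{36} \]

with

\[ \xi^{\mathrm{rev}}[k] = T(k)^{1/a}\, \gamma_k / \lambda, \tag{37} \]

where $T(k) = \varepsilon'_1 + \cdots + \varepsilon'_k$ for i.i.d standard exponential r.v. $\varepsilon'_i$, the $\gamma_k$, $k = 1, 2, \ldots$, are i.i.d gamma$(a + 1, 1)$, and the $(\varepsilon'_k)$ and $(\gamma_k)$ are independent.

b) Let $\xi^{\mathrm{rev}}(1) < \xi^{\mathrm{rev}}(2) < \ldots$ be the increasing order statistics of the $\xi^{\mathrm{rev}}[k]$'s. Then the $\xi^{\mathrm{rev}}(k)$'s are jointly distributed as in (35), for a sequence $(\varepsilon_k)$ which is implicitly defined in terms of the $(\varepsilon'_k)$ and $(\gamma_k)$ above, and the f.d.d. convergence in (34) and (36) hold jointly.

c) For each $n$, let $J_n = (J_{nk}, 1 \le k \le n)$ be the permutation of $\{1, \ldots, n\}$ defined by

\[ X_n^{\mathrm{rev}}[k] = X_n^{\mathrm{rev}}(J_{nk}). \]

As $n \to \infty$,

\[ (J_{nk}, 1 \le k \le n) \xrightarrow{\text{f.d.d.}} (J_k : 1 \le k < \infty), \tag{38} \]

where $(J_k)$ is the random permutation of $\{1, 2, \ldots\}$ defined by

\[ \xi^{\mathrm{rev}}[k] = \xi^{\mathrm{rev}}(J_k) \tag{39} \]

for $k = 1, 2, \ldots$, and the f.d.d. convergence in (34), (36), (38) all hold jointly.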



Proof. By Lemma 18, it is sufficient to prove the theorem for the case where $F$ is gamma$(a, \lambda)$. Part a follows immediately from Proposition 12 and the law of large numbers, in the same way that part b follows from the proof of (35). For the last part, note that in principle everything has been presented as a function of the variables $T(k)$ and $\gamma_k$. For $a, \lambda > 0$, define

\[ \Psi_{a,\lambda}(s) = s^{1/a}\, \Gamma(a + 1)^{1/a} / \lambda. \]

Observe that $\Psi_{a,\lambda}$ has inverse function $\Psi_{a,\lambda}^{-1}(x) = \lambda^a x^a / \Gamma(a + 1)$. Rewrite (35) as

\[ \xi^{\mathrm{rev}}(k) = \Psi_{a,\lambda}(S(k)), \tag{40} \]

then (37) implies

\[ S_k := \lambda^a\, \xi^{\mathrm{rev}}[k]^a / \Gamma(a + 1) = T(k)\, \gamma_k^a / \Gamma(a + 1). \tag{41} \]

Comparing (40) and (41) gives a pairing between $S_k$ and $S(k)$ via $\xi^{\mathrm{rev}}(k)$ and $\xi^{\mathrm{rev}}[k]$. Hence we obtain another definition of $J_k$ equivalent to (39):

\[ S_k = S(J_k). \]

Let $T_j$ be the $T$ value corresponding to the order statistic $S(j)$ of the sequence $S_k$. That is,

\[ T_j = T(K_j), \]

where $(K_j)$ is a random permutation of the positive integers. By (41), $(J_k)$ is the inverse of $(K_j)$. Together with (34) and (36), this implies (38), completing the proof of part c. □
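The description of $J_k$ through $S_k = S(J_k)$ lends itself to simulation. The following is a minimal Monte Carlo sketch, not the authors' procedure: it generates $S_k = T(k)\gamma_k^a / \Gamma(a+1)$ as in (41), truncates the infinite sequence at a finite number of terms (an approximation introduced here), and reports how often $S_1$ is the smallest value, that is, how often $J_1 = 1$. The function name `sample_J1` and the truncation level are hypothetical choices.

```python
import math
import random


def sample_J1(a, n_terms, rng):
    """Sample J_1, the rank of S_1 among S_1, ..., S_{n_terms}.

    S_k = T(k) * gamma_k**a / Gamma(a + 1), where T(k) is the k-th partial
    sum of i.i.d. standard exponentials and gamma_k are i.i.d. gamma(a+1, 1).
    Truncating at n_terms is an approximation of the infinite sequence.
    """
    c = math.gamma(a + 1.0)
    t = 0.0
    s_first = None
    rank = 1
    for k in range(n_terms):
        t += rng.expovariate(1.0)                      # next point T(k)
        s = t * rng.gammavariate(a + 1.0, 1.0) ** a / c  # S_k as in (41)
        if k == 0:
            s_first = s
        elif s < s_first:
            rank += 1                                  # one more S_k below S_1
    return rank


rng = random.Random(0)
trials = 4000
hits = sum(sample_J1(1.0, 200, rng) == 1 for _ in range(trials))
estimate = hits / trials
print(estimate)  # Monte Carlo estimate of P(J_1 = 1) for a = 1
```

For $a = 1$ the estimate can be compared with the value of approximately 0.555229 reported at the end of this section; agreement is only up to Monte Carlo and truncation error.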



Since the $T(k)$'s are increasing ordered points of a p.p.p on $(0, \infty)$ with rate 1 and independent of the $\gamma_k$'s, the point process on the positive quadrant $(0, \infty)^2$

\[ N(\cdot) = \sum_k 1[(S_k, T(k)) \in \cdot\,] = \sum_j 1[(S(j), T_j) \in \cdot\,] \tag{42} \]

is a p.p.p for the measure $\mu$ with density

\[ \frac{\mu(ds\, dt)}{ds\, dt} = \frac{1}{a}\, \Gamma(a + 1)^{1/a}\, (s/t)^{1/a}\, \frac{1}{t}\, \exp\{-(\Gamma(a + 1)\, s/t)^{1/a}\}. \tag{43} \]

The random permutation $(J_k)$ and its inverse $(K_j)$ are obtained from the p.p.p in (42) in the $(s, t)$ plane by ranking the points in two different orders according to $s$-value or $t$-value. Since the projection of a Poisson process is Poisson, the $s$- and $t$-marginals of $\mu$ are both Lebesgue measure. Thus, the ranked $s$-values form a p.p.p with rate 1, which determine the ranked $\xi^{\mathrm{rev}}(j)$ by the deterministic increasing transformation

\[ \xi^{\mathrm{rev}}(j) = \Psi_{a,\lambda}(S(j)). \]

Furthermore,

\[ \xi^{\mathrm{rev}}[k] = \xi^{\mathrm{rev}}(J_k) = \Psi_{a,\lambda}(S_k) \]

comes from the same set of $s$-values listed in order of increasing $t$-values.
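The claim that both marginals of $\mu$ are Lebesgue can be verified by direct integration. The sketch below writes the density of (43) as $\frac{1}{a} c^{1/a} s^{1/a} t^{-1-1/a} \exp\{-(cs/t)^{1/a}\}$ with the shorthand $c = \Gamma(a+1)$, which is notation introduced here.

```latex
% s-marginal: integrate over t with v = (cs/t)^{1/a}, i.e. t = c s v^{-a},
% dt = -a c s v^{-a-1} dv.
\[
\int_0^\infty \frac{1}{a}\, c^{1/a} s^{1/a}\, t^{-1-1/a}\, e^{-(cs/t)^{1/a}}\, dt
  = \int_0^\infty e^{-v}\, dv = 1 .
\]
% t-marginal: integrate over s with the same v, i.e. s = t v^{a} / c,
% ds = a t v^{a-1} c^{-1} dv.
\[
\int_0^\infty \frac{1}{a}\, c^{1/a} s^{1/a}\, t^{-1-1/a}\, e^{-(cs/t)^{1/a}}\, ds
  = \frac{1}{\Gamma(a+1)} \int_0^\infty v^{a} e^{-v}\, dv = 1 .
\]
```

Both integrals equal 1 for every value of the remaining coordinate, so each marginal has constant density 1 on $(0, \infty)$.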

Marginal distributions of the random permutation $(J_k)$ and its inverse $(K_j)$ are given in (27) and (28). Note that for $k = 1, 2, \ldots$, $S_k = T(k)\, \gamma_k^a / \Gamma(a + 1)$ for i.i.d $\gamma_k$ distributed as gamma$(a + 1, 1)$, independent of the sequence $(T(k))$, and $T_k = \Gamma(a + 1)\, S(k)\, \varepsilon_k^{-a}$ for i.i.d standard exponentials $\varepsilon_k$, independent of the sequence $(S(k))$ but not of the $\gamma_k$'s. Thus by conditioning on either $S(k)$ or $T(k)$, one can evaluate (27) and (28) explicitly. In particular, by a change of variable $r = \Gamma(a + 1)^{1/a} (s/t)^{1/a}$, one can write $\mu$ in product form. This leads to the following.

Proposition 20. For $j \ge 1$, conditioned on $S(j) = s$, $T_j = \Gamma(a + 1)\, s\, r^{-a}$ for some $r > 0$, $K_j - 1$ is distributed as

\[ \mathrm{Poisson}(m(s, r)) + \mathrm{Binomial}(j - 1, p(s, r)) \tag{44} \]

with

\[ m(s, r) = a\, s\, r^{-a} \int_r^\infty x^{a-1} e^{-x}\, dx, \tag{45} \]

and

\[ p(s, r) = a\, r^{-a} \int_0^r x^{a-1} e^{-x}\, dx, \tag{46} \]

where the Poisson and Binomial random variables are independent. Similarly, for $k \ge 1$, conditioned on $T(k) = t$, $S_k = t\, r^a / \Gamma(a + 1)$ for some $r > 0$, $J_k - 1$ is distributed as

\[ \mathrm{Poisson}(m'(t, r)) + \mathrm{Binomial}(k - 1, p'(t, r)) \tag{47} \]

with

\[ m'(t, r) = \frac{t}{\Gamma(a + 1)} \int_0^r (r^a - x^a)\, e^{-x}\, dx, \tag{48} \]

and

\[ p'(t, r) = \frac{1}{\Gamma(a)} \int_0^r x^{a-1} e^{-x}\, dx, \tag{49} \]

where the Poisson and Binomial random variables are independent.
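The product form mentioned before Proposition 20 can be made explicit. The following is a sketch under the representation $S_k = T(k)\gamma_k^a/\Gamma(a+1)$ from (41), with the shorthand $c = \Gamma(a+1)$ introduced here; it also shows how the Poisson mean in (45) arises from it.

```latex
% The change of variable r = (cs/t)^{1/a} recovers the gamma(a+1, 1) mark
% \gamma_k attached to the point (S_k, T(k)).
\begin{align*}
\mu(dt\, dr) &= \frac{r^{a} e^{-r}}{\Gamma(a+1)}\, dt\, dr
   && \text{(the $T(k)$ have rate 1, the marks are gamma$(a+1,1)$)},\\
\mu(ds\, dr) &= e^{-r}\, ds\, dr
   && \text{(since $t = c\, s\, r^{-a}$ gives $dt = c\, r^{-a}\, ds$ at fixed $r$)}.
\end{align*}
% Poisson mean in (44): a point (s', r') with s' > s has t' < c s r^{-a}
% exactly when r' > r (s'/s)^{1/a}. Substituting x = r (s'/s)^{1/a}:
\begin{align*}
m(s,r) = \int_s^\infty e^{-r (s'/s)^{1/a}}\, ds'
       = a\, s\, r^{-a} \int_r^\infty x^{a-1} e^{-x}\, dx .
\end{align*}
```

In the $(s, r)$ coordinates the marks are standard exponential, which is the source of the factor $e^{-r}$ when $r$ is integrated out in the case $a = 1$.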



Proposition 21 (Marginal distributions of $K_1$ and $J_1$). Suppose that (29) holds for some $\lambda > 0$ and $a = 1$. Then the distribution of $K_1$, the $k$ such that $\xi^{\mathrm{rev}}(1) = \xi^{\mathrm{rev}}[k]$, is a mixture of geometric distributions, and so is that of $J_1$, the $j$ such that $\xi^{\mathrm{rev}}[1] = \xi^{\mathrm{rev}}(j)$. In particular,

\[ P(K_1 = k) = \int_0^\infty p_r\, q_r^{k-1}\, e^{-r}\, dr \tag{50} \]

where $p_r = r/(r + e^{-r})$, $q_r = 1 - p_r$, and

\[ P(J_1 = j) = \int_0^\infty p_r\, q_r^{j-1}\, r e^{-r}\, dr \tag{51} \]

where $p_r = 1/(r + e^{-r})$, $q_r = 1 - p_r$.



Proof. When $a = 1$, $\int_r^\infty e^{-t}\, dt = e^{-r}$. Substituting into (45) and (48) gives

\[ m(s, r) = s\, r^{-1} e^{-r}, \qquad m'(t, r) = t\,(r - 1 + e^{-r}). \]

By a change of variable, (43) becomes

\[ \frac{\mu(ds\, dr)}{ds\, dr} = e^{-r}, \qquad \frac{\mu(dt\, dr)}{dt\, dr} = r e^{-r}. \]

Thus, conditioned on $s$ and $r$, $K_1 - 1$ is distributed as the number of points in a p.p.p with rate $r^{-1} e^{-r}$ before the first point in a p.p.p with rate 1. This is the geometric distribution on $(0, 1, \ldots)$ with parameter $p_r = 1/(1 + r^{-1} e^{-r})$. Since the marginal density of $r$ is $e^{-r}$, integrating out $r$ gives (50). The computation for the distribution of $J_1$ is similar. □



One can check that (50) and (51) sum to 1. We conclude with a 'fun' computation. Suppose that (29) holds for some $\lambda > 0$ and $a = 1$. That is, $F$ behaves like an exponential c.d.f near 0. By Proposition 21, $E(J_1) = 9/4$ and $E(K_1) = \infty$. That is, the last size-biased pick is expected to be almost the second smallest order statistic, while the smallest order statistic is expected to be picked infinitely earlier on in a successive sampling scheme(!). The probability that the last species to be picked in a successive sampling scheme is also the one of smallest species size is

\[ \lim_{n \to \infty} P(X_n^{\mathrm{rev}}[1] = X_n^{\mathrm{rev}}(1)) = P(\xi^{\mathrm{rev}}[1] = \xi^{\mathrm{rev}}(1)) = P(J_1 = 1) = P(K_1 = 1) = \int_0^\infty \frac{r e^{-r}}{r + e^{-r}}\, dr = 1 - \int_0^1 \frac{u}{u - \log u}\, du \approx 0.555229. \]
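The quantities in this closing computation can be checked numerically from (50) and (51). The sketch below uses a plain composite midpoint rule; the helper `integrate`, the truncation of the integrals at `UPPER`, and the grid size are choices made here for illustration, not part of the source.

```python
import math


def integrate(f, lo, hi, n=100000):
    """Composite midpoint rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def p_K(r):
    """Geometric parameter for K_1, as in (50)."""
    return r / (r + math.exp(-r))


def p_J(r):
    """Geometric parameter for J_1, as in (51)."""
    return 1.0 / (r + math.exp(-r))


UPPER = 50.0  # assumed truncation of the integrals over (0, infinity)


def prob_K1(k):
    """P(K_1 = k): geometric mixture with mixing density e^{-r}."""
    return integrate(
        lambda r: p_K(r) * (1.0 - p_K(r)) ** (k - 1) * math.exp(-r), 0.0, UPPER
    )


def prob_J1(j):
    """P(J_1 = j): geometric mixture with mixing density r e^{-r}."""
    return integrate(
        lambda r: p_J(r) * (1.0 - p_J(r)) ** (j - 1) * r * math.exp(-r), 0.0, UPPER
    )


# E(J_1) is the mixture of the geometric means 1/p_r = r + e^{-r}.
mean_J1 = integrate(lambda r: (r + math.exp(-r)) * r * math.exp(-r), 0.0, UPPER)

print(prob_K1(1), prob_J1(1))  # both approximately 0.555229
print(mean_J1)                 # approximately 9/4
```

The corresponding integrand for $E(K_1)$ is $(1 + e^{-r}/r)\, e^{-r}$, which behaves like $1/r$ near 0; this is why $E(K_1) = \infty$ and why no numerical value is computed for it here.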